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Introduction 


Many  techniques  have  been  considered  previously  for  target  discrimination  and 
tracking:  Kalman-filter  techniques,  model-based  schemes,  neural  network  ap¬ 
proaches,  classical  pattern  matching  and  matched  filter  methods,  dyadic  wavelet 
and  subband  methods,  and  so  on.  All  have  had  limited  success  in  terms  of 
achieving  fully  reliable  automated  discrimination.  When  targets  and  objects 
are  in  motion  in  an  ideal  noise- free  environment,  many  techniques  will  work 
well.  However,  under  more  realistic  conditions,  these  algorithms  encounter  dif¬ 
ficulties.  Tracking  an  object  when  many  other  objects  are  present  in  the  field 
can  be  challenging.  This  situation  can  occur  when  a  ballistic  missile  has  frag¬ 
mented,  The  target,  which  in  this  case  is  the  warhead,  could  be  surrounded  by 
tens  or  hundreds  of  fragments  of  similar  size.  Further  complicating  the  problem 
are  object  trajectories  that  cross  and  thus  introduce  periods  of  occlusion.  This 
can  cause  tracking  algorithms  to  derail.  In  addition,  the  image  environment 
in  which  these  algorithms  must  work  is  often  noisy.  Image  acquired  from  an 
on-board  optical  camera  located  in  an  interceptor  seeker  head  is  typically  noisy. 
Noise  has  been  the  “kiss  of  death”  for  conventional  motion  models  that  have 
been  developed  in  the  context  of  video  coding  and  motion  analysis,  such  as  block 
matching,  optical  flow,  and  pel-recursive  algorithms.  Finally,  useful  algorithms 
must  be  able  to  cope  with  rapidly  changing  motion  and  short  response-time 
constraints. 

This  project  is  aimed  at  developing  new  algorithms  for  tracking  missiles  and 
warheads.  The  initial  theoretical  foundation  for  this  effort  is  multiresolution 
analysis,  in  particular,  continuous  multidimensional  wavelet  theory.  In  concert 
with  this  framework,  we  are  aggressively  developing  new  classes  of  motion-model 
techniques  for  target  tracking,  in  support  of  this  project. 

Thus  far,  we  have  pursued  several  unique  approaches. 

First  we  have  developed  the  theory  for  an  algorithm  based  on  the  spatio- 
temporal  wavelet  transform  using  Galilean  wavelets.  This  work  is  de¬ 
scribed  in  detail  in  the  paper  “Spatio-Temporal  Wavelet  Transforms  for 
Motion  Tracking.”  It  will  appear  in  the  Proceedings  of  ICASSP97  in  May, 
and  is  included  in  Appendix  A. 

Second  we  have  develped  a  parallel  spatio-temporal  wavelet  approach  that  uses 
kinematical  wavelets.  The  transform  coefficients  are  different  in  this  case, 
resulting  in  a  different  complexity /performance  tradeoff.  Details  of  this 
approach  are  given  in  the  paper  “Spatio-Temporal  Continuous  Wavelets 
Applied  to  Missile  Warhead  Detection  and  Tracking.”  This  paper  will  be 
presented  at  the  SPIE  VCIP  Conference  in  February.  A  copy  is  included 
in  Appendix  B. 

Third  we  have  developed  a  new  affine-motion  model  for  estimating  motion 
and  used  it  to  develop  a  model-based  method  for  noise  suppression  to 
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facilitate  target  tracking.  Thus  far,  this  algorithm  seems  to  work  very 
well  under  a  broad  set  of  conditions.  It  is  described  in  the  paper  “Sensor 
Data  Enhancement  of  Ballistic  Missile  Warheads  Using  an  Affine  Motion 
Model.”  A  copy  of  this  paper  is  included  in  Appendix  C. 

Fourth  we  are  combining  some  of  the  advances  we  made  with  the  noise  sup¬ 
pression  algorithm  (appendix  G)  with  the  wavelet  tracking  approach.  A 
discussion  of  this  work  in  progress  is  given  in  Appendix  D. 

Fifth  we  have  examined  a  new  approach  to  motion  estimation  which  employs 
the  explicit  use  of  linear  prediction.  The  linear  prediction  algorithm  was 
developed  in  a  Ph.D.  thesis  by  Robbie  Armitano.  A  discussion  of  the 
experimental  evaluation  is  given  in  Appendix  E.  The  technique  works  well 
under  mild  noise  conditions.  It  has  significant  advantages  over  block- 
based  approaches  in  this  regard.  However,  under  the  moderate  and  severe 
noise  conditions  we  used  in  our  test  sequences,  the  algorithm  was  not 
successfull  in  tracking  the  motion.  Nonetheless,  we  found  this  investigation 
interesting.  It  could  be  very  useful  for  sensors  with  mild  noise  because  it 
works  under  these  conditions  and  the  complexity  is  comparable  to  that  of 
conventional  block  matching  algorithms. 

Sixth  we  examined  a  simple  approach  at  the  beginning  of  the  year,  which 
involved  using  simple  filtering  and  a  Fourier  feature  representation.  This 
work,  although  small  in  scope,  gives  us  some  insight  into  the  performance 
complexity  tradeoff.  It  is  very  simple  from  a  complexity  perspective,  yet 
its  results  are  reasonable.  We  are  presently  doing  some  modifications  and 
comparisons.  A  discussion  of  this  effort  will  be  given  in  the  final  project 
report,  after  comparative  testing  has  been  completed. 
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Appendix  A 


SPATIO-TEMPORAL  WAVELET  TRANSFORMS  FOR  MOTION  TRACKING.  * 


Jean- Pierre  Leduc  Fernando  Mujica  Romain  Murenzi  Mark  Smith 
Georgia  Institute  of  Technology 
Center  of  Signal  and  Image  Processing 
Clark  Atlanta  University 

Center  for  Theoretical  Studies  of  PbysicaJ  Systems 
Atlanta,  Georgia 


ABSTRACT 

This  paper  addresses  the  problem  of  detecting  and  tracking 
moving  objects  in  digital  image  sequences.  The  main  goal 
is  to  detect  and  select  mobile  objects  in  a  scene,  construct 
the  trajectories,  and  eventually  reconstruct  the  target  ob¬ 
jects  or  their  signatures.  It  is  assumed  that  the  image  se¬ 
quences  are  acquired  from  imaging  sensors.  The  method 
is  based  on  spatio-temporal  continuous  wavelet  transforms, 
discrethed  for  digital  signal  analysis.  It  turns  out  that  the 
wavelet  transform  can  be  used  efficiently  in  a  Kalman  fil¬ 
tering  framework  to  perform  detection  and  tracking.  Sev¬ 
eral  families  of  wavelets  are  considered  for  motion  analysis 
according  to  the  specific  spatio-temporal  transformation. 
Their  construction  is  based  on  mechanical  parameters  de¬ 
scribing  uniform  motion,  translation,  rotation,  acceleration, 
and  deformation.  The  main  idea  is  that  each  kind  of  motion 
generates  a  specific  signal  transformation,  which  is  analyzed 
by  a  suitable  family  of  continuous  wavelets.  The  analysis 
is  therefore  associated  with  a  set  of  operators  that  describe 
the  signal  transformations  at  hand.  These  operators  are 
then  associated  with  a  set  of  selectivity  criteria.  This  leads 
to  a  set  of  filters  that  are  tuned  to  the  moving  objects  of 
interest. 


1.  INTRODUCTION 

The  primary  purpose  of  the  present  work  is  to  investigate 
families  of  spatio-temporal  continuous  wavelet  transforms 
(CWT),  and  to  investigate  their  utility  for  motion  tracking 
and  trajectory  constructions.  The  approach  considered  in 
this  paper  differs  fundamentally  from  other  techniques  that 
have  been  proposed  such  as  those  based  on  optical  flow, 
pel-recursive,  block  matching  and  Bayesian  models.  The 
main  novelty  of  this  method  is  it  combines  the  CWT  with 
Kalman  filtering  for  tracking.  Several  families  of  CWTs  can 
efficiently  perform  various  tasks  like  motion-based  detection 
and  segmentation,  selective  tracking  and  reconstruction  of 
objects  in  motion.  The  CWT  is  also  highly  robust  against 
sensor  noise.  Moreover,  rt  is  able  to  handle  temporary  oc¬ 
clusions  resulting  from  crossing  trajectories.  These  proper¬ 
ties  are  generally  not  found  in  techniques  rooted  in  optical 
flow  and  block-based  motion  estimation. 

The  study  of  CWTs  originally  evolved  from  considering 
spatio-temporal  affine  transformations.  These  were  easily 
amenable  to  Lie  group  structures  and  admissible  wavelet 
representations.  The  approach  turned  out  to  be  efficient 

’This  material  is  based  upon  work  supported  in  part  by  sup¬ 
ported  partly  by  the  U.S.  Army  Research  Office  under  grants 
DAAH-04-96-1-0161  and  DAAH-04-95-1-0650,  and  in  part  by  a 
Belgian  NATO  fellowship. 


for  signal  analysis  and  enabled  the  introduction  of  numer¬ 
ous  physical  parameters  as  criteria  of  selectivity.  The  im¬ 
portance  of  CWTs  in  this  field  was  recognized  several  years 
ago  [1).  Although  the  analysis  of  image  sequences  requires 
numerous  analyzing  parameters,  only  a  small  subset  of  them 
has  to  be  considered  simultaneously  in  each  specific  appli¬ 
cation.  The  most  significant  components  of  uniform  motion 
are  studied  in  this  paper,  i.e.  the  translation,  the  rotation, 
the  deformation,  and  the  acceleration.  The  spatial  orien¬ 
tation  (the  preferential  axis  of  inertia)  and  the  scale  arc 
additional  parameters  of  concern;  indeed,  the  scale  is  in¬ 
trinsic  to  any  wavelet  analysis.  The  application  of  motion 
tracking  is  addressed  in  this  paper  and  is  illustrated  with 
CWTs  tuned  to  the  velocity,  i.e.  the  translational  motion, 
It  is  assumed  that  local  motion  is  linear.  Hence,  the  tech¬ 
nique  applies  whenever  the  approximation  is  valid  locally 
on  a  few  frames  (3  or  more).  CWTs  that  are  tuned  to  ve¬ 
locities  are  called  Galilean  wavelets  to  refer  to  the  Galilei 
group  that  describes  classical  mechanics. 

2.  BUILDING  FAMILIES  OF  CWTS 

The  construction  of  CWTs  relies  on  signal  transformations 
that  model  motion  and  object  deformations.  They  can  take 
into  account  translation,  rotation,  scale  and  shear.  One  idea 
developed  in  this  work,  consists  of  expressing  all  these  ele¬ 
mentary  transformations  as  unitary  operators  in  the  spatio- 
temporal  domain  2D  4*  T  (2D  spatial  plus  time),  and  to 
write  useful  generalizations  for  uniform  motion  (i.e.  motion 
described  by  time-invariant  parameters),  namely  transla¬ 
tion ,  rotation)  deformation ,  and  acceleration .  These  uni¬ 
tary  operators  and  their  related  parameters  are  eventually 
combined  to  form  a  general  law  of  signal  transformation. 
This  transformation  is  intended  to  be  applied  either  to  the 
signal  or  to  the  mother  wavelets.  When  applied  to  the 
signal,  it  describes  the  transformations  performed  by  the 
motion  i.e.  warpings  of  the  signal.  When  applied  to  an 
admissible  mother  wavelet,  it  generates  a  whole  family  of 
continuous  spatio-temporal  wavelets  (band  pass  filters  for 
signal  analysis).  The  construction  of  these  families  is  ruled 
by  locally  compact  groups  (Lie  groups)  and  the  admissi¬ 
bility  of  a  mother  wavelet  is  enforced  by  three  representa¬ 
tion  properties:  square-integrability,  irreducibility  and  uni- 
tarity.  The  procedure  to  calculate  admissible  wavelets  is 
well-known  and  relates  to  that  of  the  coherent  states  orig¬ 
inating  from  theoretical  physics.  As  such,  it  has  already 
been  developed  in  other  research  work  in  the  one  and  multi¬ 
dimensional  cases  [2],  The  construction  proceeds  as  follows. 
Unitary  spatio-temporal  transformations  are  first  defined 
on  the  space  (R2  x  R).  Typically,  in  this  construction,  the 
structure  of  the  parameters  leads  to  composition  relation¬ 
ships,  inverse  and  identity  that  characterize  a  group.  The 


study  of  group  representations  in  spatio-temporal  Hilbert 
spaces  comes  thereafter.  According  to  the  theory  of  co¬ 
herent  states,  the  demonstration  of  unitarity,  irreducibil- 
ity  and  square-integrability  of  these  representations  guaran¬ 
tees  the  existence  of  admissible  continuous  spatio-temporal 
wavelets.  Practically,  this  means  that  operating  the  mother 
wavelet  in  the  space  of  the  parameter  definition  covers  the 
whole  set  of  bandpass  filters  of  finite  energy,  while  preserv¬ 
ing  all  the  well-known  wavelet  properties  (isometry,  inver¬ 
sion,  reproducing  kernel,  and  resolution  of  the  identity).  In 
this  section,  five  different  constructions  of  CWT  families 
will  be  considered  as  examples.  They  originate  from  the 
groups  covering  sdl  the  motions  [3]  and  represent  the  action 
of  Lie  algebras  and  groups  on  manifolds.  First,  operators 
on  wavelet  and  signal  {H  :  L2(R2  x  R)}  -4  L2( R2  x  R)} 
will  be  defined  with  their  respective  set  of  parameters. 

The  affine- Galilei  group  supports  the  construction  of 
CWTs  tuned  to  velocity  and  uniform  translational  motions. 
The  set  of  operators  and  parameters  involved  in  this  CWT 
are  the  spatio-temporal  translation  of  parameter  b  and  r  to 
represent  the  space  and  time  locations,  the  velocity  u,  the 
dilation  a  to  represent  the  scale,  and  the  spatial  rotation 
8  to  represent  the  orientation  (the  preferential  anisotropy). 
The  action  of  these  parameters  can  be  wTitten  as  the  fol¬ 
lowing  spatio-temporal  transformation 

*2  =  -R{-8){xi  -  b  -  vl)\  =  (1) 

a 

where  R(8)  is  the  rotation  matrix  in  SO(2).  Let  us  write 
the  wavelet,  transform,  ty(£, f),  in  the  Galilean  family.  In 
the  spatio-temporal  domain,  wc  have 

[fi(6,  T,iJ1a,0)'I']  (x,t) 

=  i*  [**(-*)  (x  -  6- vf)  ,  t-r]  ,  U 

and  in  the  Fourier  domain,  where  k  and  u  stand  for  spatial 
and  temporal  frequencies,  we  have 


n(6,r,u,a,0)5j  (£,<j) 

=  a  e-1  (fJ  +  u'r)  $  [fi(-fl)  ak,  w  +  kv] 


(3) 


The  set  of  parameters  considered  in  this  family  of  CWTs  is 
(6,  r,  u,  c,  8).  These  CWTs  are  called  Galilean  wavelets. 

A  slightly  different  approach  to  Galilean  wavelets,  called 
the  kinematical  wavelets ,  has  been  described  by  Duval- 
Destin  and  Murenzi  (1).  In  this  case,  the  set  of  parameters 
is  (6,  r,  0,0,0)  where  a  is  the  spatio-temporal  dilation,  and 
c  is  the  speed  parameter,  c  and  8  reach  the  velocity.  The 
spatio-temporal  transformation  is  given 


£2  =  -±rR(-6)(Si  -  b);  U  =  c—  (tx  -  r)  (4) 

C1' J  G  G 

Let  us  now  consider  uniform  rotational  motion  as  a  third 
spatio-temporal  transformation,  and  generate  a  family  of 
CWTs.  Uniform  rotational  motion  is  different  from  the 
spatial  rotation  of  the  S0(2)  group  in  the  sense  that  it  in¬ 
corporates  time  and  space.  The  resulting  velocity  is  given 
in  this  case  by 


v{t)  =  tJo  +  w  A  £(i)  ,  (5) 

where  u  is  the  angular  velocity,  u  is  the  translational  veloc¬ 
ity  and  £(<)  the  current  coordinate  location  of  the  moving 


object.  The  symbol  A  stands  for  the  cross  vector  product. 
Another  way  of  expressing  this  signal  transformation  in  the 
image  planes  is  given  as 


)]• 


(6) 

The  set  of  pa- 


£2  =  R(-9t)x\ ;  tn  =  tx  —  r 

[wo  =  ( SS 

rameters  considered  in  this  CWT  family  is  (6,r,a,{Jo.u3) 
or  (6,r,  a,tJo,0). 

Uniform  temporal  dilation  (i.e.  expansion  or  contrac¬ 
tion)  is  defined  by  substituting  in  Equation  (6)  R(8t)  by 


D(at)  = 


0 


- ) 


This  transformation  is  im¬ 


portant  since  any  object  in  motion  approaching  the  camera 
undergoes  rather  exponential  expansions  in  the  image  field. 
The  set  of  parameters  of  interest  for  the  CWT  construction 
are  then  (6,  r, a, t7o,  a). 

A  fifth  set  of  analyzing  parameters  would  consider  urn* 
form  acceleration  7,  given  by  the  second  order  coefficient 
when  expanding  the  trajectory  curve  £  =  f{t)  in  series 


x(t)  =  b  +  Hot  +  l-  70  f2  +  Y,  (n~+~2)!  ^  'n+:  (7) 

n=  1 

where  vq  =  is  the  velocity,  and  70  =  --~4r-  |f-0  is 

the  acceleration.  The  7„  stands  for  n^-order  acceleration 
and  is  not  considered  in  this  study.  Thus,  the  parameters 
of  interest  in  this  CWT  family  will  be  (6,  r,  a,  t>o,  70). 

3.  DEFINITION  OF  THE  CWT 

This  section  presents  the  definition  of  one  CWT  family,  the 
Galilean  wavelets.  The  signal  s(£,  t)  subject  to  analysis  is 
defined  in  the  Hilbert  space  L2(R2  x  R ,d2xdt).  The  CWT 
W[s;  6,  r,  {/,  a,  8]  is  defined  as  an  inner  product 


W  [j;  6,  t,  v,  a,  8]  =  c^1^2  <  T  0  a  e  I  s  ^ 

=  c~l/2  f  d2x  dt^SiT_.a  e  s(i,t) 

J  R3xR 

where  the  overbar  '  stands  for  the  complex  conjugate.  The 
wavelet,  is  a  mother  wavelet  It  must  satisfy  the  condi¬ 
tion  of  admissibility  (i.e.  of  square-integrability)  meaning 
that  there  exits  a  constant  c*  such  that 


C*  =  (27r)3  f  d'k  dux  <  00  . 

J  RJxR  \k\7 

A  numerically  efficient  way  of  performing  the  CWT  consists 
of  working  in  the  spectral  domain  by  means  of  the  (2D+T) 
FFT.  The  other  CWT  families  have  a  similar  definition. 


4.  EULER-LAGRANGE  EQUATION 

Let  us  consider  Lagrange’s  principle  of  the  least  action  that 
can  be  equivalently  derived  in  classical  mechanics  and  in 
optimal  control  from  the  calculus  of  variations.  The  system 
is  characterized  by  the  action  S  and  a  non-negative  defi¬ 
nite  function,  called  the  Lagrange  function,  L[£(t), x(t)\ f], 

where  £(t)  is  the  trajectory  and  i{t)  =  is  the  corre¬ 
sponding  velocity  function.  The  calculus  of  variations  al¬ 
lows  us  to  derive  the  motion  equation  and  the  trajectory 


that  optimize  the  action.  Usually,  motion  between  times  t j 
and  t7  in  a  conservative  mechanical  system  coincide  with 
the  extremal  of  the  functional 


5=  /  L[x(t)J.{t)'t}dt  , 


where  L  is  the  difference  between  the  kinetic  and  the  po¬ 
tential  energy.  Optimal  control  exploits  the  same  modeling, 
where  S'  is  a  cost  function  to  be  optimized  under  some  con¬ 
straints  to  be  specified.  The  trajectory  is  then  uniquely 
defined  when  the  initial  conditions  are  known  in  terms  of 
object  location  and  velocity  (detection  issue).  At  the  ex¬ 
tremum,  denoted  by  *,  the  calculus  derives  the  well-known 
Euler- Lagrange  equation 


d  dL  dL 

dt  g±‘  dx •  ~  0  ‘ 

Id  this  paper,  the  Lagrange  function  L  to  be  considered  is 
the  square  of  the  modulus  of  the  Galilean  CWT,  i.e.  the 
energy  density  |  <  |  a  >  |5,  E  =  f,  r’=  t  and 

v  =  x(t).  The  Cauchy-Schwarz  inequality  states  that 


characterized  by  two  equations,  a  state  equation  and  an  oh- 
servation  equation.  The  state  equation  is  an  adaptive  pre¬ 
dictor  that  updates  the  state  U(n)  of  the  filter 

U(n)  =  $(n,  n  -  3  )C/(n  —  1 )  +  W(n)  ,  (12) 

where  U(n)  is  the  state  prediction  at  step  n  and  IV(n)  the 
prediction  error.  <£  is  the  transition  matrix  or  the  feed¬ 
back  matrix  of  the  Kalman  filter.  If  the  state  is  well-chosen 
(i.e.  the  CWT  matches  the  signal),  the  predictor  behaves 
as  a  Markov  process,  and  the  prediction  error  is  a  zero- 
mean  Gaussian  process.  In  the  case  of  an  analysis  with 
Galilean  jwavelets,  the  state  parameters  are  composed  of 
the  set  (6,  r,  vy  a,  0)  and  the  prediction  step  n  is  the  image 
interval  For  other  CWTs  (like  the  accelerated  family),  the 
prediction  step  can  involve  several  images,  typically  tens  of 
them.  The  CWT  is  then  used  at  each  step  n  as  a  motion 
analyzer  to  determine  the  exact  state  values  of  the  Kalman 
filter  U(n ).  A  gradient  algorithm  works  in  the  neighborhood 
of  the  predicted  state  U(n)  to  locate  the  exact  state  U(n) 
composed  of  the  parameters  that  maximize  the  following 
energy  density 


*  [£»w]  s  (k}uj)  |2 

*  /r=xr^  to  I*  M  Is  /R,xR^  duj  |S  (k,u)  |J  , 

where  s(x,l)  is  a  band-limited  version  of  s(x,i)  with  one 
or  several  moments  equal  to  zero.  Then,  equality  proceeds 
if  ^(k,u/)  =  c  s  (£, u).  This  inequality  provides  some 
starting  conditions  for  the  wavelet  transform  to  perform 
matched  filtering  or  correlation.  The  analyzing  wavelet  has 
to  be  matched  to  the  object  with  respect  to  its  spectrum 
and  its  motion.  In  our  case,  the  unique  optimum  to  be 
tuned  must  correspond  to  the  trajectory.  This  enables  a 
stable  and  unambiguous  tracking  procedure.  This  impor¬ 
tant  property  must  then  be  analytically  demonstrated  for 
each  family  of  wavelets  when  applied  to  the  particular  mo¬ 
tion  under  investigation.  This  equation  and  all  its  related 
theory  remain  valid  in  our  case  and  interconnect  our  analy¬ 
sis  problem  not  only  to  the  theory  developed  for  mechanical 
systems  but  also  to  optimum  control.  The  equations  and 
the  algorithms  that  have  been  developed  to  recursively  con¬ 
struct  the  optimum  control,  apply  readily  to  this  problem. 
Let  us  mention  the  Kalman  filter  and  Bellman's  algorithm 
(Viterbi  algorithm). 

5.  DETECTION  AND  TRACKING 

The  detection  of  moving  objects  relies  on  extracting  local 
maxima  in  the  velocity  representation,  E  =  f{v,a), 


at 

I  <  >  I2  drd2b  (11) 


i.e.  from  the  energy  density  computed  by  integrating  the 
energy  of  the  CWT  over  the  space  and  the  length  of  the 
scene.  This  technique  effectively  characterizes  all  the  mov- 
ing  objects  and  the  velocities. 

The  tracking  strategy  is  based  on  combining  Kalman  fil¬ 
ters  and  CWTs.  The  state  of  the  Kalman  filter  is  composed 
of  all  the  wavelet  parameters.  Usually,  Kalman  filters  are 


MAX  E(b,  r,  v,  a,  8)  =  I  <  ®g,r.9.a,#|3  >  |2  .  (13) 

The  observation  equation  also  exploits  the  CWT  as  a 
motion-based  extraction  tool  tuned  to  the  current  exact 
state  parameters.  The  CWT  captures  and  isolates  the  se¬ 
lected  objects  from  the  scene  s  to  provide  a  display  /, 


/(n;6,r)  = 


6,Tlff=i7opf  ,c  =  a0 


f|s  >  +  VP(n;fa,r) 


I  IS  the  segmented  image  of  the  selected  object,  displayed 
alone  at  its  correct  location;  s  is  the  original  signal  under 
analysis,  and  V  is  the  noise  produced  by  the  optical  sensors. 


6.  MORLET  WAVELET  AND  APPLICATIONS 

The  applications  presented  in  this  paper  for  detection  and 
tracking  has  been  performed  with  the  Galilean  CWT.  An 
anisotropic  Morlet  wavelet  is  admissible  as  a  mother  wavelet 
in  the  Galilean  family;  it  defines  a  non-separable  filter 


=  e'*0^  e-^^  1  CA’>  _  e-}<*o  i  Dk0>  c-$<a’ica’> 


where  X  =  (x,t)T  6  Rn  x  R,  C  is  a  positive  def¬ 
inite  matrix  and,  D  =  C“‘.  For  2 D  +  T  signals, 

r  / 1/«*  o  o  m 


I u  ~  9  V£v  0  w'here  the  c  factors  intro- 

I  \  0  0  1/d  J 

duce  anisotropy  in  the  wavelet  shape.  Figures  1  and  2  show 
the  energy  density  of  the  Morlet  wavelet  in  the  Fourier  do- 

m^Dirineil0C'?'  v  =  O'P)-  A  high  selectivity  or  anisotropy 
€t  —  1000  has  been  applied  to  flatten  the  wavelet  along  the 
velocity  plane.  Figure  4  presents  the  issue  of  the  motion  de¬ 
tection  applied  to  the  synthetic  scene  displayed  in  Figure  3. 
Figures  5  and  6  present  the  tracking  of  one  accelerated  ob¬ 
ject  captured  out  of  five  others. 


7.  CONCLUSIONS 

Several  families  of  spatio-temporal  CWTs  have  been  pro- 
posed  in  this  paper  as  tools  to  analyze  spatio-temporal  sig¬ 
nals  with  respect  to  mechanical  criteria.  Among  them,  the 


Galilean  wavelet  transform  is  tuned  to  velocities  and  uni¬ 
form  translation  motion.  We  have  shown  how  that  CWT 
family  can  handle  detection  and  tracking  applications.  We 
believe,  at  this  point,  that  the  approaches  based  on  CWTs 
have  promise  in  the  area  of  motion  tracking.  Tracking  has 
also  been  shown  possible  even  under  severe  noise  conditions, 
and  even  when  occlusions  occur. 
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Figure  3.  synthetic  noisy  image  sequence. 
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Figure  4.  Velocity  detection  in  the  noisy  sequence: 
v=  (vi.vv)  =  (0,  .5),  (0,1)  (0.2),  (1,0). 
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Figure  1.  Galilean  wavelet  in  velocity  plane  (1,0).  Figure  5.  Selective  trajectory  construction  (remark:  the  upper 

bound  image  is  located  at  x  =  64). 


Figure  2.  Galilean  wavelet  in  velocity  plane  (1,0). 


Figure  6.  Selective  velocity  tracking. 
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ABSTRACT 


This  paper  addresses  the  problem  of  tracking  a  ballistic  missile  warhead.  In  this  scenario,  the  ballistic  missile  is 
assumed  to  be  fragmented  into  many  pieces.  The  goal  of  the  algorithm  presented  here  is  to  track  the  warhead  that 
is  among  the  fragments.  It  is  assumed  that  images  are  acquired  from  an  optical  sensor  located  in  the  interceptor 
nose  cone.  This  imagery  is  used  by  the  algorithm  to  steer  the  course  of  interception. 

The  algorithm  proposed  in  this  paper  is  based  on  continuous  spatio-temporal  wavelet  transforms  (CWTs), 
Two  different  energy  densities  of  the  CWT  are  used  to  perform  velocity  detection  and  filtering.  Additional  post¬ 
processing  is  applied  to  discriminate  among  objects  traveling  at  similar  velocities.  Particular  attention  is  given 
to  achieving  robust  performance  on  noisy  sensor  data  and  under  conditions  of  temporary  occlusions. 

First  we  introduce  the  spatio-temporal  CWT  and  stress  the  relationships  with  classical  orientation  filters. 
Then  we  describe  the  CWT-based  algorithm  for  target  tracking,  and  present  results  on  synthetically  generated 
sequences. 

Keywords',  Motion  estimation;  Target  detection  and  tracking;  Continuous  wavelet  transform 


1  INTRODUCTION 


One  of  the  more  difficult  problems  related  to  target  tracking  is  the  interception  of  ballistic  missiles.  In  this 
scenario,  an  optical  image  sensor  is  located  in  the  nose  cone  of  an  interceptor.  The  images  acquired  are  processed 
on-board  and  the  course  of  interception  is  determined.  Several  factors  complicate  the  problem.  First,  the  images 
received  are  noisy,  often  making  it  difficult  to  isolate  the  object  of  interest.  Second,  ballistic  missiles  often 
fragment,  resulting  in  a  cloud  of  many  objects  traveling  on  similar  trajectories.  Among  this  debris  is  the  target 
of  interest,  which  is  the  warhead.  Third,  it  is  possible  that  the  trajectories  of  missile  fragments  may  cross  during 


the  course  of  their  travel.  Thus  for  a  brief  period  of  time,  the  warhead  may  be  occluded  by  debris.  Finally,  both 
the  missile  and  the  interceptor  are  traveling  at  high  speeds.  Hence,  very  little  processing  delay  can  be  tolerated, 
since  decisions  regarding  course  changes  must  be  made  early  for  a  successful  interception  to  occur. 

The  specific  problem  we  address  in  this  paper  is  developing  an  algorithm  capable  of  tracking  a  target  in  this 
noisy  environment.  It  is  assumed  that  the  true  target  location  has  been  passed  to  the  interceptor  by  a  ground 
station.  The  algorithm  we  propose  to  estimate  object  trajectories  is  based  on  spatio-temporal  continuous  wavelet 
transforms  (CWTs),  and  attempts  to  exploit  the  multidimensional  nature  of  the  representation.  The  CWT  has 
the  attractive  property  that  it  is  parameterized  according  to  physically  meaningful  parameters,  namely,  location 
£,  time  £,  scale  a,  velocity  magnitude  c,  and  orientation  B .  Multiple  feature  spaces  can  be  derived  by  integrating 
over  a  subset  of  the  parameters.  The  structure  of  the  CWT  provides  a  natural  way  to  approach  the  tracking 
problem,  and  has  resulted  in  our  developing  a  three-stage  algorithm. 

One  feature  space  extracts  the  velocities  embedded  in  the  image  sequence.  This  is  called  the  velocity  detection 
stage.  In  the  next  stage,  a  second  representation  is  used  to  retain  those  objects  moving  at  a  prespecified  velocity 
and  to  reject  background  noise  and  other  objects  moving  at  different  velocities.  Finally,  post-processing  is 
performed  in  order  to  resolve  multiple  objects  moving  at  the  same  velocity. 

This  problem  is  largely  one  of  motion  estimation  and  analysis.  The  most  popular  techniques  of  this  type  are 
based  on  block  matching  and  optical  flow  schemes.  Block  matching  techniques  model  motion  in  terms  of  a  coarse 
block-translation  representation,  while  optical  flow  methods  rely  on  gradient  approximations.  These  techniques, 
in  general,  are  highly  sensitive  to  noise  (2,  4,  3],  and  cannot  handle  occlusions  in  a  natural  way. 

More  appropriate  for  this  application  is  a  filter /transform-based  approach  that  exploits  the  directional  orien¬ 
tation  characteristic  of  the  motion  in  both  wave-number1  /frequency  and  the  spatio-temporal  variables.  Clever 
selection  and  use  of  directional  filters  can  effectively  isolate  specific  object  motion.  This  approach  is  more  robust 
to  noise  and  can  naturally  handle  temporary  occlusions  resulting  from  crossing  trajectories. 

Methods  using  Gabor  filters  and  projection  filters  have  already  been  considered  as  techniques  to  capture  the 
directional  characteristic  of  the  motion.  However,  they  generally  do  not  provide  sufficient  velocity  resolution 
for  this  application  [2],  The  spatio-temporal  CWT,  however,  can  be  effective  in  this  regard.  We  have  obtained 
excellent  velocity  estimates  from  the  CWT  and  achieved  rejection  of  background  noise  as  part  of  the  process. 
Additionally,  the  spatio-temporal  domain  of  the  CWT  allows  the  interpolation  of  “missing”  information  when 
occlusions  occur. 

In  the  following  sections,  we  introduce  the  spatio-temporal  CWT  and  discuss  the  relationships  with  classical 
orientation  filters.  A  description  of  the  CWT-based  algorithm  for  target  tracking  is  presented  next,  and  supported 
by  experimental  results  on  synthetic  data. 


2  MOTION  ESTIMATION 


Motion  estimation  is  a  difficult  task,  especially  when  noisy  imagery  and  complex  motion  are  considered. 
Classical  algorithms  for  motion  estimation  axe  intrinsically  based  on  approximations  of  a  local  gradient,  which  is 
typically  calculated  from  a  pair  of  image  frames.  As  such,  these  techniques  are  highly  sensitive  to  noise  [2,  4,  3). 
Another  limitation  of  these  techniques  is  their  inability  to  handle  occlusions. 

To  overcome  this  problem,  a  filter /transform-based  approach  is  usually  taken.  Here  the  image  sequence  is 
considered  as  the  input  signal,  to  take  full  advantage  of  time  correlations.  Among  these  algorithms  are  Gabor  and 
projection  filtering  techniques.  Unfortunately,  these  filters  have  relatively  poor  velocity  resolution  [2,  4],  thus, 


1  Wave-number  in  this  context  refers  to  the  spatial  frequencies. 


they  are  not  appropriate  for  the  application  at  hand. 


The  idea  underlying  our  approach  is  to  process  at  once  as  many  frames  as  permitted  by  memory  and  process¬ 
ing  time  restrictions.  By  expanding  the  temporal  region  of  support  of  the  filters,  we  improve  robustness  against 
noise  and  enable  complex  motion  to  be  handled  easily.  In  addition,  “missing"  information  produced  by  crossing 
trajectories  can  be  interpolated,  resulting  in  improved  behavior  against  occlusions.  This  approach  has  the  draw¬ 
back  of  increasing  the  temporal  delay  of  the  overall  system,  which  is  of  primary  importance  in  the  scenario  of 
interest.  This  implies  a  tradeoff  between  accuracy  of  the  estimation  and  temporal  delay.  In  this  paper,  we  focus 
on  the  motion-estimation  accuracy, 


2.1  DEFINITIONS 

The  Hilbert  space  of  interest,  H,  corresponds  to  the  space  of  square  integrable  signals  over  time  and  space, 
that  is,  L2(1R2  x  3R,  cPx  dt).  The  corresponding  norm  is: 

IM|J=/  f  \s(x,t)\Wxdt.  (1) 

Jn*  Jm. 

To  distinguish  the  dimensions,  we  use  the  term  (2+l)D  to  signify  two  spatial  dimensions  plus  time.  The  Fourier 

transform  of  a  (2+l)D  signal,  s(x, t),  is  denoted  by  s(k,u>),  where  k  and  u>  are  the  wave-number  and  temporal 
frequency  respectively. 


2.2  MOTION  TRANSFORMATIONS  ON  THE  FOURIER  SPACE 


To  understand  how  motion  affects  the  wave-number/frequency  domain  consider  a  steady  signal 

s(x,  t)  =  s(£)  . 


The  Fourier  transform  can  be  expressed  as 


(2) 


s(k,u)  =  s{k)6{u)  . 


(3) 


The  energy  is  concentrated  in  the  plane  defined  by  u>  =  0.  Now,  consider  a  moving  version  (with  velocity  v)  of 
the  same  signal, 

s(x-vt,t),  (4) 


and  its  Fourier  transform, 


s(k,uj  +  k  ■  v)  . 


(5) 


The  energy  of  the  signal  is  now  concentrated  in  a  plane  defined  by  ui  —  —k  •  v  (i.e.  a  plane  perpendicular  to  the 
vector  v).  Such  a  plane  is  called  the  velocity  plane  associated  with  the  velocity  vector  v.  The  inclination  of  the 
resulting  velocity  plane  with  respect  to  the  horizontal  plane  (wave-number  plane,  kx  -  ky)  depends  only  on  the 
magnitude  of  the  velocity,  )v|,  while  the  inclination  with  respect  to  the  kx  -  w  plane  (or  ky  -  u  plane)  depends 
on  the  orientation  of  the  velocity  vector,  a  =  tan  ^{vy/vx).  Figure  1  shows  the  effect  of  constant  motion  on  a 
synthetic  signal  (Figure  la)  in  the  wave-number /frequency  domain.  Three  velocity  magnitudes  are  considered, 
|u|  =  0,1  and  2  in  Figures  lb,  c  and  d  respectively. 


It  is  clear  that  the  problem  of  velocity  detection  and  filtering  is  a  task  for  orientation  selective  filtering.  The 
spatio-temporal  CWT  can  be  seen  as  a  tool  for  designing  orientation  filters  in  an  elegant  way.  As  it  will  be  shown  in 
the  subsequent  sections,  the  parameters  of  the  wavelet  representation  are  directly  related  to  motion  transformation 
parameters.  These  parameters,  velocity  magnitude  and  orientation,  along  with  spatial  and  temporal  translations, 
and  scale,  define  the  basic  parameters  of  the  spatio-temporal  CWT. 
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intensity  variation)  ,  (b,  c  and  d)  DFT  domain  representation  for  |ti|  =  0.5, 1.0  and  2.0  respectively 


3  SPATIO-TEMPORAL  CWT 


s=a»-^rfiSSsSS 

the  original  and  in  the  transformed  domain,  are  briefly  described.  wavelet,  i/>,  both  m 


3.1 


TRANSFORMATIONS  APPLIED  TO  THE  WAVELET 


The  function  transformation  used  to  derive  the  set  of  bases  for  the  (2  +  1)D  CWT  are  as  follows  [5,  7], 


•  Spatial  and  temporal  translation:  The  wavelet  is  shifted 


to  a  given  point  in  space-time.  This  transformation 


(6) 

(?) 


is  denoted  by  T^btT^  and  is  defined  by 

(f,<)  =  i/>(£  —  6,£  -  r) 

[r<f'T^]  (fc,w)  =  e~j{l'k+0T^O k,w). 


•  Rotation:  This  transformation,  denoted  as  R9 ,  rotates  the  wavelet  on  the  spatial  coordinates  around  the 
frequency  axis.  In  this  way  the  filters  can  be  tuned  to  a  particular  orientation  associated  with  that  velocity. 
This  transformation  is  defined  by 


[Reip](x,t)  =  \p(r~9x,t)  , 

(3) 

[f?^]  (£,w)  =  ip{r~9k,u>)  , 

(9) 

= 

cos(0)  sin(0) 
-sin(0)  cos(0) 

(10) 

•  Scaling:  This  transformation,  denoted  by  is  standard  in  the  wavelet  framework  and  is  defined  by 

0  = 

a  a 

(11) 

Daifi j  (£,u)  = 

-  Q?t2i)(ak,ajbj)  . 

(12) 

•  Speed  tuning:  This  transformation  can  be  seen  as  two  independent  scaling  operations  performed  on  the 
spatial  and  temporal  variates.  This  allows  for  the  localization  of  the  wavelet  around  the  correct  inclination 
of  the  velocity  plane.  It  is  denoted  by  Ac  and  defined  by 

[Ac-0]  (f,  i)  =  t/»(c_1/3f,c2/3i)  ,  (13) 

AcV'](£,w)  =  ili(c1/3k,c~2/3u)  .  (14) 

Thus,  the  parameter  c  is  directly  associated  with  the  velocity  magnitude,  \v\.  This  parameter  was  introduced 
in  [5],  in  order  to  resolve  the  tradeoff  between  high  (low)  spatial  frequency  and  low  (high)  temporal  frequency 
for  slow  (fast)  moving  objects. 


These  operators  taken  all  together  transform  a  particular  signal  with  region  of  support  around  location 
(a,c,0,6,  r)  in  the  parameter  space,  to  a  new  location  (a/,c',0/,6,,r'),  provided  the  mother  wavelet  has  ap¬ 
propriate  support. 


3.2  DEFINITION 


The  spatio-temporal  CWT  of  a  (2+ 1)D  signal  is  a  mapping  from  finite  energy  signals  defined  on  (IR2  x  1R)  to 
finite  energy  signals  defined  on 

G  =  9  *  (a,  c,  0, 6,  r)  €  (IR+  x  IR+  x  [0, 2?r]  x  IR2  x  IR)  .  (15) 


Consider  the  composite  transformation  defined  by  the  application  of  all  the  operators  defined  in  the  previous 
section, 


TS’TR*AcDaip]  (f,f) 


(16) 


(17) 

(18) 
(19) 


=  a~3/2^(?73 -r-ff(£-6),~(f-r)) 


M](£w)  =  [r6'TJ?sAc2?v]  (if.w) 


=  a3/'2t/i(ac1/,3r''s£, — ^w)e-j(*'^+u’T)  . 

r-  .I  ^n’ence’  letus  define  (if,  tf)  =  m,*/1]  (x,t)  and  its  equivalent  Fourier  transform  to  be,  ip,  -  (k  u) 

M  (k' W)>  The  Spat5°-temP0ral  CWT  of  a  (2  +  DD  signal,  5,  with  respect  to  the  mother  wavelet,  Js  defined 


S^(a,c,9,b,T) 


r^(a,c,B,b,T)\s) 


fc^j  f  ^(a,c,e,l r)(Ii<)'s(*>*)d2f  1 


the  *-  *  - 


=  (2*)3  /  / 

JlR?  J  IF 


f  WMP 
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In  the  wave-number/frequency  domain,  (the  k/u  domain),  the  spatio-temporal  CWT  may  be  expressed  as 

S(a,c,6,b,r)  =  ~tf(o,ei,ir,r)|$)  .  (25 


Here  t/»  is  normalized  to  c0  _  1,  The  spatio-temporal  CWT  satisfies  Parseval’; 

wion  ^ 


s  condition  of  energy  conserva- 


rJU  \S{a,cyeXr)\7~~ded?bdT 


=  II5II2  =  INI2 


Thus,  E  =  0, 6,  r)|2  is  the  energy  density  in  the  CWT  domain. 

i***?  * — 

selective  filters  (i.e.  filters  with  regions  of  support  dose  to  a  particular  vdodtTptaT"^  tfe  CWt"  V''°C'? 
a  suitable  tool  for  velocity  detection  and  filtering.  Nevertheless  selection  of  mother  t  i  ?’ th  T  r,epresents 
since  usefulness  of  the  representations  depends  on  that  choice’  The  Mnth  her  wavelets  remains  a  key  issue, 
described  in  the  next  section.  The  M°ther  waveIet  used  in  the  simulations  is 


in  the  simulations  is 


3.3  MORLET  WAVELET 


The  spatio-temporal  extension  of  the  Morlet  wavelpf  is  a  cmnH  .  /»  , 

.ocatea  a)^^“ 


Moreover  an  anisotropy  parameter,  t ,  applied  to  the  spatial  variates  controls  the  variance  of  the  wavelet  with 
respect  to  the  reference  velocity  plane.  The  Morlet  wavelet  is  defined,  in  the  spatio-temporal  domain,  by 


’), 


(24) 


and  in  the  wave-number/frequency  domain  by, 

\p(k,u)  =  (e-il*-£°'3  -  e-i(l^l5+lf=l:,))(e_i(uJ“u'o)3  -  e_i<t’3+,j3>)  .  (25) 

Figure  2  shows  the  transformed  Morlet  wavelet,  7pa  g  g  T(fc,ui),  in  the  wave-number/frequency  domain.  The  three 
filters  shown  (two  views  of  each)  are  tuned  to  the  velocities  of  the  objects  from  Figure  1,  that  is,  for  c  =  0.5, 1.0 
and  2.0.  The  other  parameters  are  fix  as  follow,  8  =  0,£o  =  |6,0]',cuo  =  6  and  e  =  .25.  A  set  of  scales, 
a  =  {1,1.25,1.5,1.75,2},  is  chosen  to  cover  uniformly  a  given  velocity  plane.  As  it  can  be  seen  from  Figure  2, 


Figure  2:  Wave-number/frequency  representation  of  three  Morlet  wavelet  filters  with  parameters  8  =  0.  ka  = 
[6,0]',wo  =  6, €  =  0.25  and  a=  {1.0,1.25,1.5,1.75,2.0}.  (a)  c  =  0.5,  (b)  c  =  1.0,  (c)  c  =  2.0. 

the  region  of  support  of  the  filters  is  an  ellipsoidal  cone  concentrated  around  a  particular  velocity  plane.  Thus, 
velocity  detection  and  filtering  is  possible. 


3.4  ENERGY  DENSITIES 


The  multidimensional  nature  of  the  CWT  calls  for  a  reduction  of  the  variates  to  consider  in  the  visualization 
process.  This  can  be  done  by  either  fixing  a  subset  of  the  parameters,  or  by  integrating  over  a  subspace  of  the 
parameter  space.  The  second  approach  has  the  nice  property  that  it  can  result  in  invariant  representations  with 
respect  to  some  parameters.  Thus,  partial  integration  on  a  subset  of  the  parameters  in  the  energy  density  results 
in  different  energy  representations  that  can  be  used  to  extract  relevant  features.  Among  the  various  possibilities 
for  energy  representations,  two  are  of  particular  interest  for  object  tracking  and  detection  applications. 


•  Speed-orientation  energy  density:  Here  integration  is  performed  over  scale,  a,  spatial  translation,  6,  and 
temporal  translation,  t.  As  a  result,  we  obtain  the  following  energy  density, 


£,m>= /,/,/  ji< 

JjR12  Jjr  c  1  a 


(26) 


Space-time-velocity  energy  density:  In  this  representation,  the  speed  tuning  parameter  and  the  orientation 
are  fixed  (c  =  =  8i)  and  integration  over  the  scale  parameter,  a,  is  performed.  The  resulting  energy 

density  is  given  by, 
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tt,Ci  ,0;  ,b,T  I  ^ 
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(27) 


The  impact  of  these  two  energy  densities  on  the  proposed  detection  and  tracking  algorithm  is  discussed  in  the 
next  section. 


4  DETECTION  AND  TRACKING  ALGORITHM 


At  this  stage,  it  is  important  to  mention  that  a  computer  implementation  of  the  CWT  requires  a  discretization 
of  its  parameters.  Thus,  integration  in  the  formulas  described  in  the  previous  sections  should  be  replaced  by 
appropriate  summations.  In  this  section,  we  redefine  the  two  energy  representations  E\  and  E2  of  equations  (26) 
and  (27)  for  the  discrete  case. 

The  detection  and  tracking  algorithm  relies  on  the  multidimensional  nature  of  the  spatio-temporal  CWT  [5], 
Two  different  energy  density  representations  are  used  to  estimate  motion  parameters.  The  first  two  stages  of  the 
proposed  algorithm  are  depicted  in  Figure  3. 


Input 


Output  1 


Output  n 


Figure  3:  Object  Detection  and  Tracking  Algorithm. 


The  first  stage,  velocity  detection,  calculates  the  energy  density  in  the  speed-orientation  representation. 
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(28) 


Here  local  maxima  fo,^)  correspond  to  velocities  present  in  the  scene.  These  local  maxima  are  used  in  the  next 
stage,  velocity  filtering,  to  calculate  the  energy  density  in  the  space-time  representation  for  each  pair  (c<,0,), 


^2(b.T,ci,f)i)  =  ^^- 


OjCi 


<  V’  tf.  rT|s  > 

.0,r * 


(29) 


The  result  is  a  series  of  image  sequences,  each  containing  objects  moving  at  a  given  velocity.  Other  local  features, 
like  shape,  scale  and  orientation,  can  be  used  to  discern  between  objects  moving  at  the  same  velocity.  This 
algorithm  provides  a  means  for  motion-based  detection  and  tracking  in  a  self  contained  manner. 


5  RESULTS  AND  CONCLUSIONS 


Algorithm  development  and  testing  were  performed  using  synthetically  generated  sequences.  In  this  initial 
work,  we  restrict  the  analysis  to  linear  motion.  Figure  4a  shows  a  64  x  64  x  64  pixel  representation  of  four  moving 
objects.  The  velocities  of  each  of  the  four  objects  are  specified  in  the  following  table: 


Object 

Vx 

vV 

1 

0.5 

0 

2 

1 

0 

3 

2 

0 

4 

0 

1 

Figure  4:  Four-object  test  sequence  in  original  domain. 

Note  that  object  4  occludes  object  2  around  the  middle  of  the  sequence.  Gaussian  noise  2  is  added  to  the  sequence 
to  simulate  realistic  imagery.  Frame  number  20  of  the  noisy  sequence  is  shown  in  Figure  4b. 

The  result  of  the  velocity  detection  stage  (i.e.  energy  density  in  the  speed-rotation  representation)  is  shown 
in  Figure  5a,  The  local  maxima  are  represented  as  plus  signs  in  the  contour  plot.  These  local  maxima  are  used 
in  the  velocity  filtering  stage.  Frame  number  20  of  the  resulting  output  sequences  are  shown  in  Figure  5c. 

As  a  result,  each  sequence  contains  just  objects  moving  at  a  given  velocity,  and  the  background  noise  has  been 
reduced.  In  the  post-processing  stage,  thresholding  and  local  maxima  extraction  is  used  to  recover  the  trajectories 
of  all  four  objects.  Figure  5b  shows  the  trajectories  obtained.  Note  that  trajectories  are  correctly  interpolated 
when  occlusion  occurs.  It  is  worth  mentioning  that  even  though  no  explicit  motion  model  is  assumed,  trajectories 
are  accurately  found. 

At  this  point,  we  believe  the  CWT-based  method  has  promise  in  the  area  of  motion  tracking.  Velocities  were 
successfully  detected  and  velocity  filtering  was  demonstrated  in  a  natural  way,  leading  to  noise  reduction  in  the 
output.  TYacking  of  the  object  trajectories  was  shown  to  be  possible  even  under  severe  noise  conditions,  and  even 
when  occlusions  were  present.  Work  continues  in  the  CWT  development  aimed  at  addressing  complex  motion. 


2The  resulting  signal  has  a  SNR  of  10  dB  with  respect  to  the  original  signal. 


Figure  5:  Detection 
the  four  objects;  (c) 


and  tracking  algorithm  results,  (a)  Velocity  detection 
Frame  number  20  of  the  filtering  stage  result,  £2- 


stage  results,  Ei\  (b)  Trajectories  of 
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APPENDIX 


Hare  the  spatimtempora]  CWT  in  the  temporal  domain,  equation  (20),  is  caiculated  for  a  moving  impulse, 

a(x'-<X>-  (3C) 

A  simplified  version  of  the  Morlet  wavelet  is  used,  ft  >  .  and  w„  >  5  (5).  This  Morlet  wavelet  is  defined  by, 

l/>(z,  t)  =  e^i(l*|3+ta)e~;(fc0.£+Wet) 

After  applying  ail  the  operators  defined  by  equation  (19),  we  obtain  the  set  of  basis  functions  defined  by, 
fya, c,0,S,t)(5>O  =  a“3/2e“J^37T|£“l’|Je^ri(1-r)5e77^T*.T*(£-i) 


(31) 


(32) 


It  can  be  demonstrated  that  the  CWT  of  the  signal,  s,  is  given  by, 


,Cy$}b,TyV,k0  ,U/0) 


where  the  functions  f\  and  J2  are  defined  by, 


,  r  -  r  ,  |6|2  .  c4/3r2  (cV  +  g-v)2  (qj0  +  fc0  •  r*£T)2 

fi{a,c,8,b,T,  v,  k0,u0)  —  o2c2//3  +  a2  a2c2/^(c2  +  |t712)  c2  +  |{J]2 

c*/3tuj0  k0-reb  (^r  -f  6  •  t))(au0  +  /c0  •  rfln) 


/2(a,  c,8,b,T,v,k0,uj0)  = - + 


ac1/3  ac1/3(c2  +  |vl2) 


(33) 


(34) 

(35) 


Consider  the  spatio-temporal  CWT  of  the  moving  impulse  as  a  function  of  its  velocity,  v,  the  speed  parameter, 
c,  and  the  orientation  parameter,  0,  that  is, 

Sff(a,c,M,r).  (36) 

We  expect  to  obtain  a  maximum  of  this  expression  when, 


(37) 

(38) 


Figure  6a  shows  the  evaluation  of  equation  (36)  for  8  —  0,6  =  [1,0]',  r  =  1.  The  Morlet  wavelet  has  parameters 
k0  —  [6,0]'  and  u0  =  6.  The  maximum  is  located  around  c  =  vx  =  1-  Indeed  an  object  with  horizontal  velocity 
vx  =  1  gets  to  the  point  b  =  [1,0]'  at  r  =  1.  In  Figure  6b,  this  experiment  is  repeated  for  b  =  [0,2]'  and  r  -  1. 
In  this  case,  the  peak  is  found  around  c  =  vy  -  2. 


(a)  (b) 


Figure  6:  Spatio-temporal  CWT  of  a  moving  impulse  as  a  function  of  vx  (or  vv)  and  c  for  fixed  a,0,  b ,  and  r;  (a) 
5{?=[v. ,o)' (a  =  .l,c,0  =  0,5  =  [1,0]' ,r  =  1),  (b)  Sv-=10,vy]'(a  =  -  W  =  */2,&  =  [0,2]',r  =  1). 
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ABSTRACT 


Neutralizing  the  threat  of  an  incoming  ballistic  missile  is  a  difficult  task.  Often  the  missile  disintegrates 
leaving  the  warhead  surrounded  by  a  number  of  ballistic  fragments.  Among  these  fragments  only  the  warhead 
must  be  intercepted.  Thus  the  challenge  is  for  the  interceptor  to  identify  and  track  the  warhead  so  that  a 
successful  strike  can  be  achieved.  This  paper  addresses  the  problem  of  using  noisy  image  sequence  data  captured 
by  on-board  sensors  in  the  nose  cone  of  the  interceptor  to  detect  warheads  among  fragments.  Detection  and 
tracking  necessitates  the  extraction  of  reliable  motion  estimates,  which  is  often  difficult  when  the  data  sequence 
is  noisy.  We  propose  a  simple  algorithm  that  exploits  both  spatial  and  temporal  correlation  to  suppress  noise  in 
the  observed  image  sequence.  Our  algorithm  calculates  reliable  estimates  of  the  global  motion  caused  by  camera 
movement  and  uses  these  motion  estimates  to  reduce  significantly  the  noise  in  the  video  sequence. 

Keywords:  target  tracking,  noise  suppression,  affine  motion  estimation 


1  INTRODUCTION 


,  A  vanety  sim?le  block  “d  optical  flow  based  motion  estimation  algorithms  are  used  for  target  tracking  and 
detection  problems  m  scenarios  involving  imaging  sensors  in  high  speed  missiles.2  In  general,  optical  flow  methods 
have  an  advantage  m  accurately  representing  the  true  motion  at  the  expense  of  high  computational  complexity, 
while  simple  block-based  techniques  are  faster  but  do  not  handle  dilation  and  rotational  motion  found  in  high 
speed  missile  imagery.  In  either  case,  the  tracking  or  detection  problem  is  complicated  by  the  low  signal-to-noise 
ratios  that  are  characteristic  of  these  types  of  image  sequences.  As  a  result,  spatial  processing  in  often  applied  as 
a  preprocessing  step  to  suppress  noise.2 

We  propose  a  better  preprocessing  step  that  combines  temporal  information  with  spatial  processing  to  create 
a  spatio-temporal  noise  suppression  algorithm.  The  key  to  this  approach  is  using  an  affine  motion  model  to 
desipi  the  spatio-temporal  filter.  This  affine  motion  model  provides  more  flexibility  than  traditional  translation 
motion  models  while  at  the  same  time  requiring  less  computation  than  typical  pel-recursive  motion  estimation 
techniques.  In  addition  to  being  useful  for  noise  suppression,  the  parameters  of  the  affine  model  can  also  serve  as 
initial  estimates  in  subsequent  tracking  algorithms. 


Global  Estimation  Stage  Local  Estimation  Stage 

Figure  1:  Block  diagrams  of  ballistic  missile  tracking  algorithm 


A  block  diagram  of  the  proposed  noise  suppression  algorithm  is  shown  in  Figure  1.  This  preprocessing  step 
is  envisioned  to  be  an  important  component  in  the  complete  tracking  algorithm.  The  observed  image  sequence 
is  captured  by  the  interceptor  as  it  closes  in  on  the  target.  The  most  important  task  in  this  noise  suppression 
algorithm  is  accurate  estimation  and  representation  of  the  global  motion  created  by  the  movement  of  the  camera. 
Three  issues  are  of  primary  importance  in  designing  the  motion  estimation  procedure:  flexibility  in  the  motion 
model,  robustness  to  noise,  and  computational  complexity.  Each  of  these  issues  is  addressed  in  the  following 
sections. 


2  GENERALIZED  MOTION  MODEL 


Since  the  missile  approaches  the  debris  field  at  a  high  rate  of  speed,  the  model  used  to  represent  the  global 
motion  field  must  allow  rotation  and  scale  changes.  With  this  requirement  in  mind,  consider  a  generalized  motion 
model  that  allows  arbitrarily  complex  motion.  The  relationship  between  two  frames  at  time  t  and  t  +  St  is 
described  by: 

X{ni,n2,t)  =  X(ni  +  QTe(ni,n2),n2  +  ]3T0(n1,n2),t  +  (5t),  (1) 

where  A'-(nj,n2,t)  represents  the  video  signal  luminance  and  0(n,i,n2)  =  [0i(ni,n2)...0p(ni,n2)]r  and  $(ni,n2)  = 
[01(n1,n2)...<£?(ni,n2)]T  are  basis  functions  used  to  representthe  motion  field.  The  goal  of  the  motion  estimation 
algorithm  is  to  find  optimal  values  for  the  coefficients  a  and  /?.  A  detailed  description  of  the  parameter  optimiza¬ 
tion  algorithm  used  for  computing  ct  and  /3  has  been  previously  described  so  that  only  an  overview  is  presented 
below.3  The  iterative  algorithm  is  summarized  by  the  following  flowchart: 


1.  Initialization:  Set  iteration  counter  n  =  0  and  initialize  parameter  vectors  ao  and  j30.  Evaluate  error 
function  with  initial  parameter  vector  and  store  the  result. 
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Figure  2:  Model  for  observed  signal 


2.  Taylor  Series  Expansion:  Calculate  derivatives  at  I(x  +  0^0(2, y),y  +  y))V(:r, y)  €  S 

3.  Update:  Minimize  the  approximated  error  function  and  update  parameter  vectors  to  an+i  and  /3n+1. 

4.  Evaluation:  Evaluate  the  true  error  function.  Store  parameter  error  if  error  function  decreases. 

5.  Termination  test:  Evaluate  stopping  criterion.  Increment  n  and  return  to  step  2  if  convergence  not  yet 
reached. 


The  approximated  error  function  minimized  in  Step  3  of  the  algorithm  is  given  by: 

EE  W(m,n2)  (/(m.na.O  -  I(nun2,t  +  St)  -  ^U^laT0(nun2)  -  n2))  .  (2) 

Likewise,  the  basis  functions  are  selected  as  follows  to  allow  an  affine  transformation: 


0(ni,n2)  =  5(nlln2)  = 


1 

n  i 

n2 


(3) 


An  affine  function  was  chosen  in  this  case  because  it  meets  the  requirements  of  allowing  rotation  and  scale.  If  a 
more  complex  global  motion  field  is  expected,  the  same  parameter  estimation  algorithm  can  be  used. 


The  first  point  worth  noting  is  the  estimation  process  relies  on  the  image  gradient.  This  fact  increases 
sensitivity  to  noise,  which  is  clearly  undesirable.  We  overcome  this  limitation  by  applying  a  spatial  filter  described 
in  Section  3  to  all  images  before  attempting  to  estimate  global  motion  parameters.  Additional  robustness  is  also 
provided  by  using  the  entire  field  of  debris  when  estimating  parameters  rather  than  using  a  single  target.  Using 
the  entire  debris  field  implies  that  many  pixels  at  different  location  are  integrated  into  the  estimation  process. 
Another  point  worth  noting  about  the  estimate  algorithm  is  that  it  assumes  linear  variation  of  the  intensity 
function.  Again,  this  assumption  becomes  less  accurate  in  noisy  environments,  but  the  spatial  filter  applied  to 
each  image  tends  to  enforce  the  linearity  constraint. 


3  ROBUSTNESS  TO  NOISE 


The  parameter  estimation  procedure  presented  in  Section  2  degrades  in  the  presence  of  noise.  Fortunately,  a 
spatial  filter  can  be  used  to  reduce  the  noise  while  still  preserving  enough  of  the  original  signal  energy  to  calculate 
global  motion  parameters.  The  observed  video  sequence  is  modeled  as  shown  Figure  2.  Noise  is  added  to  the 
signal  after  the  original  image  sequence  is  blurred  with  the  knowm  point  spread  function  H{w i.ujj).  The  noise 
component  is  assumed  to  consist  of  uncorrelated  Gaussian  noise  with  a  known  variance  a2.  A  more  accurate 


Symbol 

Operation 

Tg 

Gradient 

T, 

Sample  image  on  non-integer  lattice 

Ta 

Arithmetic  operation  (add/multiply/compare) 

Ti 

Matrix  inverse 

Table  1:  Abbreviations  used  to  count  floating  point  operations 


noise  model  could  also  account  for  degradations  in  signal  quality  caused  by  clouds  or  other  variables.  The  final 
component  of  the  observation  model  is  a  quantizer  that  accounts  for  the  fact  that  the  sensor  has  finite  dynamic 
range  and  fixed  resolution. 


Given  these  assumptions,  an  appropriate  choice  for  the  initial  spatial  filter  is  a  Wiener  filter.  Since  the 
statistics  of  the  original  sequence  are  unknown,  the  filter  used  in  our  simulations  is  given  by: 


C7(tox,  w2) 


H'(vjuw2) 
|#(Wi,U>2)|2  +C72 


(4) 


This  low-pass  filter  simultaneously  accomplishes  the  goals  of  reducing  noise  and  imposing  linearity  on  the  lumi¬ 
nance  function.  Object  boundaries  will  be  blurred  with  this  spatial  filter,  but  the  global  motion  parameters  can 
still  be  estimated. 


4  COMPUTATIONAL  COMPLEXITY 


The  number  of  floating  point  operations  per  second  required  to  calculate  the  global  motion  parameters  using 
the  generalized  motion  model  is  estimated  in  this  section.  Computation  complexity  is  of  critical  importance  in 
the  ballistic  missile  setting  because  a  cost-effective  solution  that  can  handle  real  time  video  is  essential.  Since  the 
estimation  procedure  is  iterative,  the  first  obvious  question  concerns  convergence,  Although  convergence  cannot 
be  analytically  proven  with  this  approach,  our  experience  in  a  variety  of  settings  has  led  to  the  conclusion  that 
at  most  five  updates  are  needed  for  each  frame.  Another  important  issue  in  iterative  algorithms  is  finding  an 
‘appropriate  starting  point.  We  address  this  problem  by  initializing  all  parameters  to  zero  in  the  first  frame.  In 
subsequent  frames,  the  parameters  obtained  by  processing  the  previous  frame  serve  as  initial  estimates  in  the 
current  frame.  This  approach  eliminates  the  need  for  any  computation  to  be  used  in  finding  starting  points  in  each 
frame.  Table  1  defines  the  abbreviations  used  below  in  computing  the  total  number  of  operations.  A  conservative 
estimate  for  the  number  of  calculations  required  in  each  stage  of  the  algorithm  to  process  an  N  x  M  image  is 
given  as  follows: 


1.  Weight  function  calculation:  Many  pixels  in  the  image  do  not  provide  any  motion  information.  Furthermore, 
since  the  motion  estimation  algorithm  is  based  on  the  optical  flow  equation,  only  areas  around  edges  prove 
useful  for  estimating  motion/  Therefore,  we  generate  the  weight  function  by  setting  the  weight  to  zero 
for  any  pixel  whose  value  is  below  T\  or  any  pixel  whose  gradient  is  below  T2.  Use  of  this  weight  function 
drastically  reduces  the  computation  required  in  the  complete  algorithm.  The  complexity  required  to  compute 
this  mask  is  given  by  NM(Tg  4 ■  T0). 

2.  Error  function  evaluation:  In  order  to  evaluate  the  error  function,  the  previous  image  must  be  resampled  at 
the  positions  given  in  Equation  (1).  A  fast  scan-line  algorithm  could  be  used  to  speed  up  the  computation 
of  image  coordinates,  but  the  approximation  given  here  assumes  the  coordinates  are  computed  directly  for 
each  pixel  in  the  image.5  Therefore,  the  complexity  introduced  by  evaluating  the  error  function  is  given  by 
p{UTa  +  Ts)>  where  p  represents  the  number  of  non-zero  values  in  the  weight  matrix. 


3.  Parameter  update:  Updating  the  motion  parameters  for  each  iteration  requires  a  6  x  6  matrix  inverse.  In 
addition,  building  the  matrix  that  eventually  gets  inverted  requires  calculating  a  gradient  and  an  outer 
product  of  two  vectors  of  length  six.  This  requires  a  total  of  T,  +  p(Tg  +  48 T0)  operations. 

The  total  computation  for  five  iterations  can  now  be  written  as: 

NM(T9  +  Ta)  +  5(Tj  +p(T9  +  T,  +  59To)).  (5) 

This  expression  can  be  reduced  further  by  specifying  the  complexity  of  each  operation  in  terms  of  one  basic  unit. 
A  reasonable  set  of  simplifications  is  to  use  bilinear  interpolation  for  image  resampling  and  a  first-order  difference 
for  gradient  calculations.  A  non-optimized  bilinear  interpolation  operation  requiring  15  arithmetic  operations 
gives  values  of  T,  =  15Ta  and  Ts  =  16Ta.  Therefore  a  simplified  expression  for  the  floating  point  operations 
required  to  estimate  motion  parameters  at  a  rate  of  30  frames  per  second  is  given  by: 

150T{  +  (510NM  +  13500p)To.  (6) 

Clearly  the  value  of  p  should  be  as  small  as  possible  to  reduce  complexity.  In  our  simulations  ,the  image  size  is 
256  x  256  and  less  than  5%  of  the  pixels  are  used  to  compute  the  motion  parameters.  Neglecting  the  time  for 
matrix  inversion,  this  leads  to  a  computational  burden  of  approximately  (8  MFLOPS  to  process  30  frames  per 
second. 


5  SPATIO-TEMPORAL  FILTERING 


Once  the  global  motion  parameters  have  been  calculated,  this  information  can  be  used  to  reduce  the  noise  in 
the  sequence  by  combining  a  motion  compensated  filter  together  with  the  spatial  Wiener  filter  previously  used. 
Suppose  that  the  motion  parameters  have  been  estimated  and  stored  for  the  previous  N  frames.  If  the  "motion 
parameters  for  frame  i  are  represented  by  (oi,/3(),  then  Equation  (1)  can  be  recursively  applied  to  find  a  mapping 
between  the  current  frame  and  each  of  the  previous  N  frames.  For  example,  with  N  =  2  we  can  write: 

Ar[ni,n2,fc]  =  A'tn!  +  al’0(ni,n2)ln2 +^^(ni,n2),fc  -  1]  (7) 

A'[ni,n2,fc]  =  X^n1+Ql_ie(ni+ale{ni,n2),n2  +  pl^(n1,n2)),  (8) 

n2  +  +  Q*^(n1,n2),n2  +  pk  <£(nj  ,n2)),  k  -  2 1  . 

Assuming  the  global  motion  model  is  accurate,  the  mapping  functions  provide  a  way  to  obtain  N  additional 
observations  of  each  pixel.  Simple  averaging  of  all  observations  reduces  the  noise  variance  by  a  factor  of  N  4-  1. 
Further  reductions  in  noise  are  obtained  by  applying  a  spatial  filter  similar  to  Equation  (4).  The  resulting  spatio- 
temporal  filter  can  be  viewed  as  a  two-step  process  where  the  N  previous  frames  are  first  warped  to  align  objects 
present  in  each  of  the  frames.  These  warped  frames  are  then  averaged  and  spatially  filtered. 


6  RESULTS 


The  spatio-temporal  noise  suppression  scheme  was  tested  on  a  simulated  video  sequence.  The  simulated 
sequence  was  generated  using  the  Georgia  Tech  Signature  (GTSIG)  model  together  with  simulated  target  trajec¬ 
tories.  GTSIG  calculates  the  thermal  signatures  of  a  target  by  solving  a  system  of  differential  equations.  A  series 
of  sequences  was  generated  to  test  different  lighting  conditions  and  interceptor  velocities.  Figure  3a-b  shows  two 
original  frames  from  a  typical  sequence.  The  field  of  debris  is  composed  of  64  fragments  where  each  fragment  is 
the  projection  of  a  3-D  object  experiencing  3-D  motion  onto  the  2-D  image  plane.  The  degraded  images  observed 


at  the  sensor  are  show  in  Figure  3c-d.  These  images  are  produced  by  the  model  in  Figure  2  where  H{w\ .  W2 )  is  a 
Gaussian  blur  filter  and  the  noise  is  uncorrelated  Gaussian  noise.  The  resulting  SNR  for  the  observed  sequence 
is  4.3  dB  for  frame  240. 

Two  approaches  to  noise  suppression  are  shown  in  Figure  4.  The  top  images  show  the  result  of  applying  the 
Wiener  filter  given  in  Equation  (4)  to  each  of  the  observed  images.  Likewise,  the  bottom  images  show  the  result 
produced  by  the  spatio-temporal  noise  suppression  algorithm  where  three  frames  are  used  in  the  filter.  In  terms 
of  SNR,  both  filters  increase  the  SNR  by  3  dB.  The  ultimate  goal,  however,  is  detection  and  tracking  of  these 
objects.  This  goal  requires  the  objects  to  be  easily  distinguished  from  the  background  with  minimal  degradation 
in  shape.  Subjectively,  the  objects  are  easier  to  distinguish  from  the  background  and  and  outlines  are  more 
accurate  in  the  frames  processed  with  the  spatio-temporal  filter.  This  difference  is  objectively  demonstrated  in 
Figure  5  by  thresholding  each  of  the  filtered  images  to  generate  a  binary  object  mask.  The  threshold  was  selected 
for  each  image  to  be  as  low  as  possible  without  introducing  any  spurious  objects  into  the  background.  The  lower 
images  in  Figure  5  clearly  show  that  more  objects  are  successfully  identified.  In  addition,  objects  boundaries  of 
large  objects  are  more  accurate. 


7  CONCLUSIONS 


We  have  presented  a  spatio-temporal  noise  suppression  algorithm  intended  for  use  as  a  preprocessing  step 
in  ballistic  missile  tracking  and  detection  systems.  As  part  of  the  noise  reduction  algorithm  we  calculate  global 
motion  estimates  that  can  serve  as  initial  estimates  of  target  trajectories  in  subsequent  processing  stages.  While 
the  computation  required  to  implement  the  algorithm  can  be  reduced  further  by  using  a  multi-resolution  scheme, 
the  current  implementation  yields  good  results  with  a  computation  burden  of  approximately  80  MFLOPS,  Future 
research  will  apply  the  generalized  motion  estimation  algorithm  used  in  this  work  to  the  task  of  computing  local 
motion  information  around  targets  of  interest. 
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Original  Frame  200 


Original  Frame  240 


Figure  3:  Selected  frames  from  the  simulated  data  sequence  used  for  testing.  The  top  row  shows  the  original 
signal  while  the  bottom  row  shows  the  observed  signal  that  has  been  corrupted  by  a  blurring  distortion  and  noise. 


Wiener  filtered  images 


Spatio-temporal  filtered  images 


(c) 


(d) 


Figure  4:  Comparison  of  filtered  frames.  The  left  column  compares  results  at  frame  200  while  the  right  column 
compares  results  at  frame  240. 


Figure  5:  Comparison  of  threshold  applied  to  filtered  frames.  The  left  column  compares  results  at  frame  200 
•while  the  right  column  compares  results  at  frame  240. 


Appendix  D 


Introduction 


The  objective  of  this  project  is  the  development  of  an  effective  algorithm  for 
tracking  a  missile  warhead  when  missile  fragments  are  in  the  scene.  The  general 
approach  taken  is  to  study  the  problem  within  the  framework  of  the  Continuous 
Multidimensional  Wavelet  Transform  (CWT)  and  to  employ  principles  from 
multirate  signal  processing  to  achieve  efficient  implementations. 

A  substantial  effort  toward  CWT  tracking  is  in  progress.  This  part  of  the 
project  is  described  in  Appendices  A  and  B.  Supporting  the  CWT  tracking 
task  is  a  motion  estimation  and  modeling  component,  which  we  discuss  next. 
Motion  modeling  may  be  used  within  the  CWT  or  within  any  multiresolution 
framework.  This  is  envisioned  to  be  a  critical  part  of  the  end  product. 

Motion  estimation  refers  to  the  task  of  calculating  the  movement  of  objects 
through  a  series  of  images.  This  task  is  important  in  a  variety  of  applications 
such  as  video  coding,  computer  vision,  and  tracking.  In  general,  two  main 
approaches  are  used  for  the  task  of  motion  estimation.  Block-based  approaches 
assume  a  translational  motion  holds  for  every  pixel  in  a  given  region.  This 
assumption  leads  to  straightforward  computation  of  motion  parameters  with 
a  search  technique,  but  the  limitation  on  translational  motion  requires  small 
block  sizes  and  is  usually  only  accurate  for  short  periods  of  time.  In  the  second 
approach,  pel-recursive  techniques  allow  a  unique  motion  vector  for  each  pixel  in 
an  image.  In  order  to  make  this  approach  tractable,  neighborhood  constraints 
are  imposed  on  the  motion  field.  These  constraints  require  the  motion  field 
to  be  spatially  smooth.  Pel-recursive  techniques  can  represent  the  true  motion 
field  much  more  accurately  and  any  type  of  motion  is  possible.  Unfortunately, 
this  flexibility  comes  at  the  cost  of  high  computation  and  reduced  robustness  to 
noise.  We  use  a  combination  of  these  two  approaches  based  on  an  explicit  model 
of  the  motion  field.  Similar  to  block-based  approaches,  we  assume  the  movement 
of  pixels  within  a  patch  is  governed  by  the  same  model.  At  the  same  time,  the 
model  gives  a  unique  motion  vector  for  every  pixel  in  the  patch  like  pel-recursive 
based  techniques.  By  increasing  the  model  order,  more  complicated  motion  can 
be  represented.  The  advantages  of  this  approach  include  better  representation 
of  true  motion  fields  relative  to  block-based  approaches,  fewer  computations 
than  pel-recursive  techniques,  and  a  compact  representation  of  the  motion  field 
with  model  coefficients. 


General  Algorithm 

The  following  derivation  uses  arbitrary  basis  functions  to  show  how  a  motion 
model  of  arbitrary  complexity  can  be  used.  The  model  relating  the  current 
image  at  time  index  t  to  the  previous  image  is  given  by 
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(1) 


/  v  1 

X(nun2,t)  =  X  rii  +^Qj^(n1,n2),  n2  +  ^/3^i(ni1n2),  t-  1 

\  <=i  •  <=i 

where  AT(ni,n2,£)  represents  the  video  signal  luminance  and  ^(ni,n2)  and 
<£t(ni>n2)  are  basis  functions  used  to  represent  the  motion  field.  The  goal  of 
the  motion  estimation  algorithm  is  to  find  optimal  values  for  the  coefficients  a* 
and  /?,*.  In  order  to  solve  equation  (1)  we  make  two  simplifications.  First,  the 
model  coefficients  are  assumed  to  remain  constant  in  an  arbitrary  2D  region 
denoted  S.  Second,  the  expression  on  the  right-hand  side  of  equation  (1)  is 
approximated  by  a  first-order  Taylor  series  expansion  as  follows: 


X  (ni  +^a1-6,i(n1>n2),  n2  +  ^/Jt^t(ni,n2) 


»=i 


X(n} ,  n2)  +  — ^n2)  J2  E  -  "a)-  (2) 


5A'(ni,n2) 


Combining  these  simplifications  with  the  motion  model  leads  to  the  following 
error  function: 


EE 

[m,n2]  €  5 


W(ni,n2) 


^X(m ,n2,t)  -  X(nun2,t-  1)  - 


<9A"  (n2  ,n2) 

dni 


v 

E<*t0i(ni,n2) 


i—i 


(9Ar(ni,n2) 

3n2 


E 


Pi<t>i{ni,n2) 


2 


(3) 


where  P7(ni,Ti2)  is  a  weighting  function.  The  minimum  of  the  error  function  is 
found  by  forcing  the  derivative  with  respect  to  each  a »  and  Pi  to  zero.  As  long 
as  a  solution  exists  the  minimum  is  found  a  (p  +  q)  x  (p  -f  q)  matrix  inverse. 
The  accuracy  of  the  estimate  is  increased  by  changing  the  location  used  by  the 
Taylor  series  approximation  in  equation  (2)  from  (ni,n2)  to 


p  <? 

*=1  i=  1 

The  motion  parameters  a*  and  Pi  are  iteratively  updated  by  forcing  the  deriva¬ 
tive  of  the  error  function  in  expression  3  to  zero  and  changing  the  location  of 
the  Taylor  series  expansion.  This  process  can  be  repeated  until  either  an  error 
criterion  or  the  maximum  number  of  iterations  is  met.  For  the  case  of  an  affine 
model,  optimal  results  are  almost  always  reached  in  5  or  fewer  iterations.  An 
important  consideration  in  this  algorithm  is  the  choice  of  initial  starting  points. 
In  general  the  assumption  of  linear  variation  in  intensity  is  only  valid  in  a  small 


8 


neighborhood  around  each  pixel.  Consequently,  the  update  term  used  in  each 
iteration  of  the  algorithm  must  be  fairly  small.  This  fact  implies  that  the  start¬ 
ing  point  must  be  reasonable  good.  In  typical  video  sequences  this  restriction 
does  not  present  a  serious  problem  because  the  temporal  sampling  rate  is  high 
enough  to  allow  using  a  trivial  starting  point  where  all  parameters  are  zero.  In 
a  tracking  setting  a  better  choice  is  to  assume  the  motion  parameters  change 
slowly  over  time  so  that  previous  model  parameters  may  be  used  for  the  initial 
estimate  at  the  current  time.  This  prediction  based  approach  was  used  in  the 
initial  simulations. 


Simulations 

The  general  algorithm  was  used  to  track  objects  in  test  sequences  composed  of 
either  3  or  64  objects.  The  tracking  algorithm  consists  of  three  parts. 

1.  Registration:  Small  changes  in  position  of  the  camera  result  in  relatively 
large  translation  displacements  of  target  locations.  Registration  approx¬ 
imately  compensates  for  these  translation  displacements.  This  step  is 
important  for  future  processing  because  it  provides  a  fixed  origin  for  the 
motion  models. 

Registration  is  achieved  with  a  simple  algorithm  that  finds  the  centroid  of 
all  objects  in  the  image.  Once  the  centroid  is  located  the  entire  image  is 
shifted  so  that  the  centroid  moves  to  the  center  of  the  image. 

2.  Global  Motion  Estimation:  Part  of  the  apparent  motion  in  the  sequence 
is  caused  by  the  decreasing  distance  between  the  camera  and  the  targets. 
We  chose  to  use  an  affine  model  for  this  component  of  target  motion.  For 
this  case  p  =  q  =  3  and  the  basis  functions  are  given  by; 

Oi{nun2)  =  (ni  ,n2)  =  1 
®2(ni  ,n2)  =  02(ni,  ri2)  =  ni 
03(^1,  n2)  =  fa(nun2)  =  n2. 

This  component  of  the  motion  applies  to  all  the  targets  equally,  and  the 
entire  image  is  used  to  estimate  the  parameters.  Also,  since  the  images 
are  registered  in  step  1,  we  can  use  the  previously  estimated  global  mo¬ 
tion  parameters  as  an  initial  estimate  of  the  motion  parameters  relating 
the  current  frame  to  the  previous  frame.  Good  initial  estimates  reduce 
computation  and  produce  more  accurate  motion  estimates. 

3.  Local  Motion  Estimation;  Finally,  a  small  correction  in  motion  parameters 
is  made  for  any  targets  of  interest.  As  in  step  2  an  affine  model  is  used 
to  approximate  the  motion.  Target  selection  is  done  by  drawing  a  box 
around  the  desired  object  early  in  the  sequence.  Local  motion  parameters 
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are  assumed  to  be  valid  within  the  box  and  the  boundaries  of  the  box  are 
updated  in  each  frame  with  the  global  motion  parameters. 


Future  Work 

The  simulation  results  show  the  algorithm  can  reliably  track  the  objects  in  the 
distortion  free  simulated  sequence.  Further  study  is  needed  to  determine  if  the 
algorithm  is  appropriate  in  realistic  scenarios  where  images  are  degraded  by 
noise  and  blurring.  Since  the  algorithm  depends  on  reliable  image  gradients 
near  edges,  noise  will  be  the  most  difficult  problem.  To  address  this  problem,  a 
robust  gradient  estimation  algorithm  must  be  used. 

In  addition  to  performance  with  noisy  inputs,  an  important  consideration 
in  real  time  operation  is  the  computational  complexity.  Simplifications  to  the 
algorithm  that  reduce  complexity  will  be  explored  and  the  total  computational 
cost  in  terms  of  floating  point  operations  per  second  will  be  evaluated. 
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Figure  Summary  of  tracking  results. 


Linear-Prediction-Based  Motion  Estimation 


As  part  of  the  Ph.D.  thesis  of  Robbie  Armitano,  a  computationally  efficient 
block-matching  motion  estimation  algorithm  was  developed  using  linear  predici- 
ton.  The  idea  is  that  it  narrows  the  search  region  using  a  linear  prediction  of 
the  motion  vectors.  It  has  been  found  that  a  reliable  estimate  of  motion  cannot 
only  reduce  the  computational  complexity  of  motion  estimation,  but  can  also 
be  used  to  (1)  decrease  the  amount  of  side  information  required  to  transmit  mo¬ 
tion  vectors,  (2)  recover  lost  motion  vectors,  and  (3)  improve  the  performance 
of  motion  estimation  in  noisy  environments.  We  examined  the  use  of  linear 
prediction  in  the  context  of  motion  analysis  for  target  tracking,  In  particular, 
we  were  interesting  in  how  well  the  noise  resiliant  properties  of  the  technique 
working  in  the  tracking  problem. 

Linear  prediction  of  motion  vectors  is  possible  because  of  the  high  correlation 
between  motion  vectors  of  neighboring  blocks.  Because  of  the  high  motion- 
vector  correlation,  one  can  use  a  linear  combination  of  previously  computed 
motion  vectors  to  obtain  an  initial  guess  (a  seed)  for  the  estimation  of  the 
current  motion  vector,  Vb(0»  as  seen  *n  Eq.  (4),  The  biasing  of  the  search 
origin  reduces  the  ambiguity  in  the  motion-estimation  function,  minimizing  the 
search  area.  The  estimate  is  based  on  motion  vectors  in  a  neighborhood  around 
the  current  motion  vector  that  were  previously  computed  in  past,  current,  or 
future  frames.  Motion  vectors  that  are  highly  correlated  are  used  as  a  3-D  region 
of  support,  The  equation  for  the  predicted  motion  vector,  Vb(t)i  in  frame  t 
with  a  pth  order  prediction  filter  is  given  as 

(4) 

jt= i 

where  b  is  the  current  block,  t  is  the  current  frame  number,  {bjt.t*}  €  ¥,  and 
the  a*  are  the  predictor  coefficients. 

The  use  of  linear  prediction  to  estimate  a  seed  (an  initial  starting  point)  for 
the  motion-vector  search  is  advantageous  because  it  reduces  the  search  time  by 
reducing  the  number  of  redundant  searches.  It  imposes  continuity  from  block  to 
block  and  frame  to  frame.  This  forces  the  motion  vector  field  to  better  represents 
the  underlying  optical  flow  and  provides  resiliance  in  noisy  environments. 

The  linear  prediction  algorithm  performs  well  in  noisy-free  environments  as 
expected.  This  is  illustrated  by  the  example  shown  in  Figure  2.  It  shows  a  single 
frame  taken  from  our  three-object  test  sequence.  The  motion  vectors  obtained 
are  correct. 

It  can  handle  noise  well  at  a  level  of  10-20  percent,  but  breaks  down  beyond 
this  range.  Figure  2  shows  the  resulting  motion  vectors  for  our  tracking  test 
sequence  under  realistic  noise  conditions.  As  can  be  seen,  the  algorithm  derails. 
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Figure  2:  Motion  vectors  obtained  for  noisy  case. 
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Figure  2:  Motion  vectors  obtained  for  noisy  case. 


